The Vertebrate Breed Ontology: Towards Effective Breed Data Standardization

Background – The limited adoption of universal data standards in veterinary science hinders data interoperability, and therefore integration and comparison; this ultimately impedes application of existing information-based tools to support advancement in veterinary diagnostics, treatments, and precision medicine. Hypothesis/Objectives – Creation of a Vertebrate Breed Ontology (VBO) as a single, coherent, logic-based standard for documenting breed names in animal health, production, and research-related records will improve data use capabilities in veterinary and comparative medicine. Animals – No live animals were used in this study. Methods – A list of breed names and related information was compiled from relevant sources, organizations, communities, and experts using manual and computational approaches to create VBO. Each breed is represented by a VBO term that includes all provenance and the breed's related information as metadata. VBO terms are classified using description logic to allow computational applications and Artificial Intelligence-readiness. Results – VBO is an open, community-driven ontology representing over 19,000 livestock and companion animal breeds covering 41 species. Breeds are classified based on community and expert conventions (e.g., horse breed, cattle breed). This classification is supported by relations to the breeds' genus and species indicated by NCBI Taxonomy terms. Relationships between VBO terms, e.g., relating breeds to their foundation stock, provide additional context to support advanced data analytics. VBO term metadata includes common names and synonyms, breed identifiers/codes, and attributed cross-references to other databases. Conclusion and clinical importance – Veterinary data interoperability and computability can be enhanced by the adoption of VBO as a source of standard breed names in databases and veterinary electronic health records.


Introduction
The success of individualized patient care relies on the availability of data, including molecular mechanisms of disease, genomic profiling, and pharmacogenomics, 3,6 available from research databases and health records. However, non-human animal and veterinary data are rarely represented in databases and electronic health records (EHR) using universally accepted standards, making computability and integration of these data challenging. Standards are critical for data integration, as they ensure that data are comparable and consistent across sources by referring to similar concepts. For example, by using an NCBI gene identifier (ID) to represent a "gene", a data entry is unambiguous, and related information referring to the same ID can confidently be combined and compared. Standard terminologies and data models have been widely accepted in model organism and human databases and EHR, fostering data interoperability, integration, and comparison, and therefore supporting precision medicine applications. Ontologies have been accepted as gold standard terminologies: not only do ontology terms represent clearly defined concepts including synonyms and other rich metadata, but ontologies also include computable relationships between terms within the same ontology and across resources, providing increased context and interoperability with a broad range of data for use during analytics. 8
Breed name information is often embedded in free text notes or in flat lists, for example in practice management software. These lists vary between data sources and rarely connect with each other (e.g., there is rarely any indication that "German Shepherd", "German Shepherd Dog", "Alsatian", and "Deutscher Schäferhund" refer to the same dog breed concept). Some breed name standards exist, e.g., Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Livestock Breed Ontology (LBO, https://www.animalgenome.org/bioinfo/projects/lbo/), and the breed-name component of Veterinary Nomenclature (VeNom) Codes (https://venomcoding.org/). However, these standards are either limited in scope (e.g., LBO is limited to livestock breeds), impose restricted licenses, or lack an ontological foundation, making them unsuitable for a wide range of uses from animal husbandry to companion animal veterinary medicine data. An open-source standard that reconciles breed names, supporting information, and their provenance is needed globally to ensure the data interoperability required for precision medicine and learning healthcare, and to inform care guidelines and breeding best practices.
Here we introduce the Vertebrate Breed Ontology (VBO) as an open, comprehensive source for breed names and metadata across all vertebrate animals, including livestock and companion animals. VBO provides linkouts to resources and supporting information, and is actively maintained and continually enhanced, providing a powerful tool for breed-related data interoperability. We describe the process of creating VBO, its maintenance by the community, and how the use of VBO can support breed-information data interoperability, integration, and precision veterinary medicine.

Sources of breeds and related information
Through active, collaborative engagements with international organizations, communities, and experts, we gathered lists of breeds and related breed information from relevant sources. A full list of these sources can be found in the VBO documentation. 9 Each breed source has specific goals; for example, the Food and Agriculture Organization (FAO) Domestic Animal Diversity Information System (DAD-IS) 10 aims to support breed conservation around the world, while canine and feline registration bodies focus on breed standards and documentation. Information included in breed lists is specific to the sources and can be contradictory. We included all information without discrimination and recorded provenance and attribution to these sources, as well as any decisions required to mitigate conflicts or discrepancies.

Creation of the VBO content
We manually reviewed and curated the breed lists to (1) group information related to the same breed under the same VBO term, (2) create VBO term names based on the most commonly used breed name and species, ensuring term label uniqueness (see VBO documentation 9 ), and (3) map breeds to their corresponding NCBI Taxonomy (NCBITaxon) record (representing the species). VBO terms were integrated within the NCBITaxon hierarchy using the is_a relation. Relations between breeds (e.g., indicating the breed's foundation stock) were also manually curated based on breed information gathered from sources.
To facilitate ontology browsing and use, we created high-level grouping terms, such as 'dog breed' and 'cattle breed', that were logically defined based on their NCBITaxon parentage (species, genus, or family). A description logic reasoner was leveraged to automatically classify VBO terms under these high-level terms. Rich metadata and cross-references to other terminologies and databases, including their provenance, were recorded for each VBO term.
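The grouping step can be illustrated with a small sketch. It is not the description logic reasoner used for VBO; it only mimics the idea that a breed falls under a grouping term when the grouping term's taxon appears in the breed's NCBITaxon lineage. The Bos taxon IDs are those quoted in this article; the dictionaries, function names, and the `EX:` placeholder ID are illustrative assumptions.

```python
# Toy parent map for a few NCBITaxon terms (child -> parent).
# The IDs are quoted in this article; the data structure is an assumption.
TAXON_PARENT = {
    "NCBITaxon:9913": "NCBITaxon:9903",   # Bos taurus -> Bos
    "NCBITaxon:9915": "NCBITaxon:9903",   # Bos indicus -> Bos
    "NCBITaxon:30521": "NCBITaxon:9903",  # Bos grunniens -> Bos
}

# Grouping terms, each defined by the taxon it covers.
GROUPING_TERMS = {
    "Cattle breed": "NCBITaxon:9903",  # "vertebrate breed of the taxon Bos"
}


def taxon_lineage(taxon_id):
    """Return the taxon followed by all of its known ancestors."""
    lineage = [taxon_id]
    while lineage[-1] in TAXON_PARENT:
        lineage.append(TAXON_PARENT[lineage[-1]])
    return lineage


def classify(breed_taxon):
    """Return the grouping terms whose taxon is in the breed's lineage."""
    lineage = set(taxon_lineage(breed_taxon))
    return sorted(name for name, taxon in GROUPING_TERMS.items() if taxon in lineage)


if __name__ == "__main__":
    print(classify("NCBITaxon:30521"))  # a yak breed
    print(classify("EX:0000001"))       # hypothetical unrelated taxon
```

In the ontology itself this inference is performed by a reasoner over logical definitions, so grouping terms do not need to be assigned by hand.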

Creation of VBO using the Ontology Development Kit (ODK)
The Ontology Development Kit (ODK) 11 provides a framework for creating ontologies, including both executable workflows for managing ontologies, such as release workflows and continuous integration, and ontology-processing tools such as ROBOT. 12 The ODK is used to automatically check VBO for errors whenever changes are proposed (e.g., new classes are added), and to release new ontology versions. VBO is managed and openly available on GitHub at https://github.com/monarch-initiative/vertebrate-breed-ontology. VBO has been accepted into the Open Biological and Biomedical Ontology (OBO) Foundry (https://obofoundry.org/). 13

VBO maintenance
Most external breed sources do not have unique and permanent identifiers that allow for a robust automated workflow. Therefore, VBO is currently mostly maintained based on user reviews and requests for changes or additions of new breeds submitted to the VBO GitHub repository (https://github.com/monarch-initiative/vertebrate-breed-ontology/issues).
More information about ontology content and maintenance can be found in Table 1, which includes the minimum information for the reporting of an ontology (MIRO). 14

Results
VBO as a standard for breeds and breed information
VBO was created as a single source for vertebrate animal breeds and related information. Since the concept of "breed" is not consistently defined between communities, we took a broad approach to defining 'vertebrate breed': "a group of animals that share specific characteristics (such as traits, behavior, genetics) that distinguish it from other organisms of the same species, and/or for which cultural or geographical separation has led to the general acceptance of its separate identity. Breeds are formed through genetic isolation and either natural adaptation to the environment or selective breeding, or a combination of the two." We created breed concepts/VBO terms when they were officially recognized by an international breed organization, or when other groups or communities identified groups of animals as a breed.
For example, 'Dog breed' (VBO:0400024) is defined at the species level as a "vertebrate breed of the taxon Canis lupus familiaris." This definition excludes wild species such as Canis rufus (red wolf), Canis latrans (coyote), and Canis lupus (gray wolf). In contrast, 'Cattle breed' (VBO:0400020) is named based on the name used in agriculture, and defined at the genus level as a "vertebrate breed of the taxon Bos." 'Cattle breed' groups classes for all breeds of Bos (NCBITaxon:9903), including Bos indicus (zebu cattle, NCBITaxon:9915), Bos taurus (cattle, NCBITaxon:9913), Bos indicus × Bos taurus (hybrid cattle, NCBITaxon:30522), and Bos grunniens (yak, NCBITaxon:30521), following the usage accepted by the animal science and veterinary communities. 10,15,16 Since breeds are identified as distinctive groups within a family, genus, or species, breeds are represented in the ontology as subclasses of these, themselves represented by NCBITaxon entities. VBO is therefore integrated within the NCBITaxon hierarchy (Figures 1B and 2).
Each term in VBO is identified by a unique and permanent ID (Table 1). Breeds from different species often share the same name; for example, "Abyssinian" is the name of a breed of horse, cat, and donkey. In addition, some breeds are commonly called by names that can represent other types of entities. For example, "Cyprus" is the name of a breed of cat, cattle, donkey, and goat, but also of the country "Cyprus" (see section "Breeds originating from DAD-IS"). To ensure that all term names were unique, we created term labels by concatenating the breed's most common name and its species' common name, following the format 'Most common name (Species)', in which Most common name and Species are the English language names (e.g., 'Cyprus (Cat)'). Sub-breeds and varieties are represented as subclasses of their parent breed. For example, 'Chihuahua, Long-Haired (Dog)' (VBO:0200339) and 'Chihuahua, Smooth-Haired (Dog)' (VBO:0200340) are subclasses of 'Chihuahua (Dog)' (VBO:0200338, Figure 1B).
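As a minimal sketch of the labeling convention described above, the following builds labels in the 'Most common name (Species)' format and reports any label that would not be unique. The function names are illustrative assumptions and are not part of the VBO tooling.

```python
from collections import Counter


def make_label(most_common_name, species):
    """Build a term label in the 'Most common name (Species)' format."""
    return f"{most_common_name} ({species})"


def find_label_clashes(breeds):
    """Return labels that occur more than once.

    `breeds` is an iterable of (most_common_name, species) pairs.
    """
    counts = Counter(make_label(name, species) for name, species in breeds)
    return sorted(label for label, n in counts.items() if n > 1)


if __name__ == "__main__":
    # "Abyssinian" names a breed in three species; the species suffix
    # keeps the three labels distinct.
    abyssinians = [("Abyssinian", "Horse"), ("Abyssinian", "Cat"), ("Abyssinian", "Donkey")]
    print([make_label(n, s) for n, s in abyssinians])
    print(find_label_clashes(abyssinians))
```

A check of this kind only detects clashes; deciding how to rename or merge clashing terms remains a curation decision.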
Term metadata and provenance are provided for each VBO term. Metadata fields, definitions, and examples are provided in the VBO documentation. 9 Required term metadata include the most common name (a synonym tagged to indicate the name by which a breed is most commonly referred to), source (indicating provenance of the information), and contributor (Open Researcher and Contributor ID (ORCID) 20 of curators and experts who contributed to the creation/revision of a VBO term) (Figure 3). Provenance for this metadata is also recorded as "source".
Additional metadata, such as other synonym(s), database cross-references, breed codes, recognition status, domestication status, extinction status, and description of origin, are included when available. Information from different sources might be discordant, for example the recognition of a breed by registration bodies. We chose not to be the arbiter of breed information, and instead record all information, relying on provenance (i.e., source annotations) to guide users. For example, the 'Australian Mist (Cat)' (VBO:0100034) is a 'fully recognized breed' of the Governing Council of the Cat Fancy (GCCF), Rare and Exotic Feline Registry (REFR), The International Cat Association (TICA), and the World Cat Federation (WCF), and a 'not recognized breed' of the Fédération Internationale Féline (FIFe) (Figure 3).
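The following sketch shows one way a consumer might hold discordant annotations side by side, each with its source, and then choose which registry to follow. The recognition statuses are those reported above for 'Australian Mist (Cat)'; the `Annotation` class and helper function are illustrative assumptions, not the VBO data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One metadata value together with the source that asserts it."""
    prop: str
    value: str
    source: str


# Recognition statuses reported in the text for VBO:0100034.
AUSTRALIAN_MIST = [
    Annotation("breed recognition status", "fully recognized breed", "GCCF"),
    Annotation("breed recognition status", "fully recognized breed", "REFR"),
    Annotation("breed recognition status", "fully recognized breed", "TICA"),
    Annotation("breed recognition status", "fully recognized breed", "WCF"),
    Annotation("breed recognition status", "not recognized breed", "FIFe"),
]


def values_by_source(annotations, prop):
    """Map each source to the value it asserts for the given property."""
    return {a.source: a.value for a in annotations if a.prop == prop}


if __name__ == "__main__":
    statuses = values_by_source(AUSTRALIAN_MIST, "breed recognition status")
    # No status is discarded: a user who follows FIFe and a user who
    # follows TICA each read the value attributed to their own registry.
    print(statuses["FIFe"])
    print(statuses["TICA"])
```

Keeping the source on every value is what allows the ontology to record contradictory statements without choosing between them.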

Breed Foundation Stock
Breeds are sometimes created by crossing other breeds whose traits and/or pedigrees are desirable. For example, 'Himalayan (Cat)' (VBO:0100117) was created from a cross of individuals from 'Siamese (Cat)' (VBO:0100221) and 'Persian (Cat)' (VBO:0100188) (Figure 4). These animals that are the progenitors, or foundation, of a breed are called "foundation stock." 21 They provide part of the underlying genetic base for a new distinct population. VBO provides information about breeds' foundation stock by using the has_foundation_stock relation. This relation is defined as "a relation between two distinct material entities (breeds or species), a descendant entity and an ancestor entity, in which the descendant entity is the result of mating, manipulation, or geographical or cultural isolation of the ancestor entity, therefore inheriting some of the ancestor's genetic material." It should be noted that foundation stock could be one or more other breeds.

Breeds originating from DAD-IS
FAO compiles and maintains a list of breeds reported by country-nominated National Coordinators from 182 countries. The goal of this DAD-IS breed list is the management of animal genetic resources, focusing on diversity of livestock breeds at national, regional, and global levels, including the status of breeds regarding their risk of extinction. DAD-IS includes specific information related to its goals, 10 and therefore the corresponding breeds in VBO have unique associated metadata and semantic information representing this information, such as domestication status and extinction status.
Most breeds in DAD-IS represent breeds that "exist in a specific country" as reported by National Coordinators. 22 This concept, specific to DAD-IS, is represented in VBO using the 'located_in some Country' axiom indicating the country where the breed was reported, using a Wikidata entry. 23 In addition, to ensure term label uniqueness, the naming conventions for these breeds follow the format 'Most common name, Country (Species)', in which Country and Species are the English language names (Figure 2).
It is important to note that this concept of "breed that exists in a specific country" is unique to DAD-IS and its goals.While VBO users should be aware of this concept, it will rarely be used in other contexts.

Community work
Although it is maintained by the Monarch Initiative, 24 VBO is a community resource that involves the participation of the community at large: anyone can request changes or new breeds to be added to the ontology through the GitHub Issue Tracker (https://github.com/monarch-initiative/vertebrate-breed-ontology/issues).

Discussion:
VBO is a unique, open, community-driven ontology for vertebrate animal breeds, covering a broad scope of animals including livestock and companion animals, and encompassing all breeds as defined by and in the context of international organizations and communities. Ontology modeling decisions were made based on use cases. However, these decisions have a few consequences that we address in the Supplemental Discussion. VBO is a standard for breed terms, which supports data disambiguation and integration. Its hierarchical classification of concepts and defined relationships between concepts allow computational logical reasoning, 25,26 which can be leveraged in predictive tools. For instance, VBO supports the construction of veterinary clinical decision assistance tools that provide information about disease susceptibility in breeds, and precision medicine tools to identify optimal treatments for individual animals. In addition, VBO can be leveraged for cross-species translational research and work in the field of conservation medicine. 27
The same breed can be referenced using different names across veterinary databases and scientific reports, since no universal standard has yet been adopted. Using VBO IDs in these databases and reports disambiguates breed-related data. 28,29 For example, the information in Online Mendelian Inheritance in Animals (OMIA) has been rendered more interoperable by using VBO terms to specify breeds in which a trait or disorder and/or a likely causal variant has been documented. 30
Disease prediction tools can be augmented by leveraging the computational logical reasoning of VBO, for example in the context of disease susceptibility in specific breeds. For instance, 'Persian (Cat)' is known to be susceptible to autosomal dominant polycystic kidney disease (ADPKD). The axioms in VBO indicate that 'Persian (Cat)' (VBO:0100188) is a foundation stock for 'Exotic Shorthair (Cat)' (VBO:0100096), which itself is a foundation stock for 'Foldex (Cat)' (VBO:0100099) (Figure 4). Based on these relationships, one could theorize that 'Exotic Shorthair (Cat)' and 'Foldex (Cat)' could also be more susceptible to ADPKD. Similar inferences could guide disease prediction and treatment discovery that might be appropriate for some breeds but not others of the same species.
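The inference over foundation stock can be sketched as a transitive closure over has_foundation_stock edges. The edges and IDs below are only those stated in this article, so the graph is deliberately partial; the function names are illustrative assumptions, and the output is a list of candidate breeds for a hypothesis, not evidence of susceptibility.

```python
# has_foundation_stock edges stated in this article (breed -> its foundation stock).
HAS_FOUNDATION_STOCK = {
    "VBO:0100117": {"VBO:0100221", "VBO:0100188"},  # Himalayan -> Siamese, Persian
    "VBO:0100096": {"VBO:0100188"},                 # Exotic Shorthair -> Persian
    "VBO:0100099": {"VBO:0100096"},                 # Foldex -> Exotic Shorthair
}


def all_foundation_stock(breed, edges=HAS_FOUNDATION_STOCK):
    """Return direct and inferred (transitive) foundation stock of a breed."""
    seen = set()
    stack = list(edges.get(breed, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, ()))
    return seen


def breeds_derived_from(ancestor, edges=HAS_FOUNDATION_STOCK):
    """Return breeds that have `ancestor` anywhere in their foundation stock."""
    return sorted(b for b in edges if ancestor in all_foundation_stock(b, edges))


if __name__ == "__main__":
    # Candidate breeds to examine for a condition documented in Persian cats.
    print(breeds_derived_from("VBO:0100188"))
```

Because the relation is treated as transitive, 'Foldex (Cat)' is returned even though it is only linked to 'Persian (Cat)' through 'Exotic Shorthair (Cat)'.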
We relied on external sources, such as international organizations, communities, and experts, to determine whether a term should be added to VBO. However, because these external sources have specific purposes (e.g., breed competitions, breed diversity), there are often disagreements on whether a group of animals should be recognized as a breed. For example, 'Plott Hound (Dog)' (VBO:0201023) is an American Kennel Club (AKC) recognized breed, but is not included in the breed list of the Fédération Cynologique Internationale (FCI). Similarly, VeNom 34 lists 'Labradoodle (Dog)' (VBO:0200798) as a breed, although it is not included in breed lists of canine registration bodies such as the AKC, FCI, and United Kennel Club. We took an inclusive approach and created a VBO term when any source considered a group of animals to be a breed, making VBO relevant to a broad range of use cases. We trust that, by recording the provenance of all pieces of information, users will be able to decide whether or not to include a particular VBO term for their application.
Many animal breeds have historically been based on conformation standards including structure and appearance (e.g., coat color, hair length, size), as demonstrated by reports and text descriptions in international breed organizations and breed references. 35 Advances in genetics bring new breed information and, in some cases, call into question the validity of the ancestry of these breeds. 28,36 In addition, the discovery of genetic components associated with trait aspects (e.g., variations determining cat tail length 37 and the effect of genotype on performance in racing 'Standardbred (Horse)' (VBO:0000899) and 'Scandinavian Coldblood Trotter (Horse)' (VBO:0017173) 38 ) is changing how breeds are defined and how individuals are selected for breeding. This disparity between breeds defined by traits and breeds defined by genetics can have a large impact on veterinary data. For example, treatment efficacy might be affected by genetic factors. Therefore, predicting treatment efficacy between breeds would be more accurate if breeds were related to each other based on genetics, and not based on traits. The introduction of relationships and classifications in VBO to specify how breeds are related (i.e., genetically versus phenotypically) will augment VBO's potential.
Veterinary EHR is the ideal data source in which VBO should be implemented, as it directly interacts with clinical decision-support tools. An achievable first step would be annotating research artifacts, including journal articles and datasets, with VBO IDs. This would aid in collating studies performed in animals of the same breed for the purpose of data integration in systematic reviews and meta-analyses, helping to overcome the problem that prospective veterinary studies are performed on small numbers of animals. 39 VBO is built by the community for the community and is a blueprint and first step towards achieving data harmonization in veterinary medicine.
Breed lists were collected from international breed organizations and communities. These lists were manually curated and integrated such that the same breed concept and its information were represented by a single VBO term. Term classification based on species (NCBITaxon) was also done manually.

B.1 Need
Single source for breed name and related information, representing a broad range of species and breeds, open access, and including provenance for the information.

B.2 Competition
Livestock Breed Ontology (LBO) is a resource for livestock breeds. LBO is, however, limited to livestock breeds, and does not include companion animals (such as cat and dog breeds). In addition, many new livestock breeds (e.g., some breeds reported in DAD-IS) are also out of scope in LBO.
- Any sources containing breed information, for example veterinary EHR
- Publications, in order to disambiguate breeds and enable data curation and integration.

C. Scope, requirements, development community (SRD)
C.1 Scope and coverage
All vertebrate breeds, including sub-breeds, varieties, strains, etc.

C.2 Development community
The content of the ontology was initially created based on lists from international breed organizations and communities.
Additional VBO terms and breed information are driven by user requests and newly available (either discovered or not yet included) breed sources.

D.1 Knowledge acquisition method
Breed lists were collected from international breed organizations and communities and manually curated, in consultation with animal experts. Review and verification happen through targeted reviews involving experts, and via user requests.

D.2 Source knowledge location
Sources where the breed knowledge was gathered can be found here: https://monarch-initiative.github.io/vertebrate-breed-ontology/general/general/

D.3 Content selection
User requests for new breeds and updates to the existing ontology are given priority.
While synchronization of information with the original sources and the addition of new breed sources are also of high priority, these are addressed as ontology editor resources become available.

E.1 Knowledge Representation language
The OWL language is used as it is more expressive and allows axioms such as "located_in value [Wikidata ID for country]", which are necessary for the majority of the breeds from the DAD-IS list.
.obo and .json versions of the ontology are also available; however, these formats do not include all information included in the .owl format (e.g., "located_in some country" axioms).

E.2 Development environment
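The Ontology Development Kit (ODK, https://github.com/INCATools/ontology-development-kit) was used to create VBO and is used to maintain it.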

F.2 Entity deprecation strategy
Term deprecation happens when:
- A VBO term represents a concept that never existed (i.e., created by mistake). In this case, the term is obsoleted.
  - Its ID is maintained with the annotation 'owl:deprecated': true.
  - The obsoletion reason is recorded using the term 'domain entity does not exist'.
  - A link to the issue tracker discussing this obsoletion is recorded using the 'term tracker item' (IAO:0000233) annotation.
- A VBO term represents the same breed concept as another VBO term (i.e., concepts are duplicated). In this case, the terms are merged.
  - The ID of the merged term, i.e., the one that is obsoleted, is maintained with the annotation 'owl:deprecated': true.
  - The obsoletion reason is recorded using the term 'term merged'.
  - The annotation "replaced by" indicates the VBO ID of the term into which it was merged.
  - A link to the issue tracker discussing this merge is recorded using the 'term tracker item' (IAO:0000233) annotation.
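For data consumers, the practical consequence of this strategy is that a stored ID may need to be resolved to a current term. The sketch below follows "replaced by" links for merged terms and returns nothing for terms obsoleted without a replacement. The record layout, field names, and `EX:` IDs are hypothetical and do not reflect the serialization used in VBO releases.

```python
# Hypothetical term records keyed by ID (illustrative only).
TERMS = {
    "EX:0000001": {"deprecated": True, "reason": "term merged", "replaced_by": "EX:0000002"},
    "EX:0000002": {"deprecated": False},
    "EX:0000003": {"deprecated": True, "reason": "domain entity does not exist"},
}


def resolve(term_id, terms=TERMS):
    """Return the current term ID for `term_id`, or None if there is none.

    Follows 'replaced_by' links for merged terms; stops if a term was
    obsoleted without a replacement or if the links form a cycle.
    """
    visited = set()
    while True:
        record = terms[term_id]
        if not record.get("deprecated"):
            return term_id
        replacement = record.get("replaced_by")
        if replacement is None or replacement in visited:
            return None
        visited.add(term_id)
        term_id = replacement


if __name__ == "__main__":
    print(resolve("EX:0000001"))  # merged term resolves to its replacement
    print(resolve("EX:0000003"))  # obsoleted without replacement
```

Because deprecated IDs are retained rather than deleted, records annotated with an older ID remain interpretable after a merge.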

Figure 1: Classification of vertebrate breeds in VBO. (A) High-level classification based on species (e.g., 'Dog breed') and community usage (e.g., 'Cattle breed'). (B) Representation of Chihuahua dog breeds in VBO. 'Chihuahua (Dog)' is a subclass of 'Dog breed', itself a subclass of 'Vertebrate Breed' and 'Canis lupus familiaris'. 'Chihuahua, Long-Haired (Dog)' and 'Chihuahua, Smooth-Haired (Dog)' are subclasses of 'Chihuahua (Dog)', since they are more specific instances of 'Chihuahua (Dog)'. The term from the NCBITaxon hierarchy is shown in a thick-bordered box. Arrows represent the is_a relation. Some relations and VBO terms are not displayed in this figure for clarity. All VBO and NCBITaxon IDs are reported in Table 2.

Figure 2: Classification of bovine breeds. (A) Relation between selected cattle breeds and their NCBITaxon species of Bos taurus, Bos indicus, and Bos grunniens. Each VBO term is related to an NCBITaxon term. Breeds defined as having been reported in a specific country by National Coordinators in DAD-IS are either direct subclasses of their corresponding NCBITaxon term or subclasses of other breeds (and therefore inherit the NCBITaxon subclass). Direct subclasses of NCBITaxon terms shown in this figure are 'Guraghe, Ethiopia (Cattle)' and 'Eastern Yak, Bhutan (Yak (domestic))'. Subclasses of other breeds shown in this figure are 'Aberdeen Angus, Brazil (Cattle)' and 'Aberdeen Angus, Ireland (Cattle)', subclasses of 'Aberdeen-Angus (Cattle)', and 'Zebu, Guyana (Cattle)' and 'Zebu, Australia (Cattle)', subclasses of 'Zebu (Cattle)'. (B) 'Cattle Breed' is defined as "Vertebrate breed of the taxon Bos", and is, therefore, a subclass of Bos. Bos encompasses Bos taurus, Bos indicus, and Bos grunniens. As a consequence, all breeds of these NCBITaxon species (as shown in A) are classified under 'Cattle Breed'. Similarly, 'Bovine Breed', being defined as "Vertebrate breed of the taxon Bovinae", encompasses 'Cattle Breed' (of taxon Bos), 'American Bison Breed' (of taxon Bison bison), and 'Buffalo Breed' (of taxon Bubalus bubalis), since Bos, Bison bison, and Bubalus bubalis are subclasses of Bovinae (not shown). Terms from the NCBITaxon hierarchy are shown in thick-bordered boxes. Arrows represent the is_a relation. Some relations and VBO terms are not displayed in this figure for clarity. All VBO and NCBITaxon IDs are reported in Table 2.

Figure 3: Examples of metadata for 'Australian Mist (Cat)' (VBO:0100034). (A) The "has exact synonym" value "Australian Mist", with the synonym type "most common name", indicates the name most often used to refer to the VBO term. The provenance for this information is shown as "source" annotations. (B) Two "breed codes" are used to refer to this VBO term: AMD and ALM. The corresponding "source" indicates which organization uses each code. (C) Contributors who participated in the creation of this term are recorded as metadata using ORCID. (D) Recognition by registration bodies is recorded in "breed recognition status". Provenance for this metadata is also recorded as "source".

Figure 4: Breeds are related to their progenitors via the has_foundation_stock relation. 'Himalayan (Cat)' has progenitors 'Siamese (Cat)' and 'Persian (Cat)'; therefore this breed is related to them via the has_foundation_stock relation. Due to the transitive property of the has_foundation_stock relation, terms inherit the has_foundation_stock from their progenitor(s). For example, 'Foldex (Cat)' has_foundation_stock 'Exotic Shorthair (Cat)', which itself has_foundation_stock 'Persian (Cat)'. Therefore, it can be inferred that 'Foldex (Cat)' has_foundation_stock 'Persian (Cat)' (thick hashed arrow). Full arrows represent the is_a relation; hashed arrows represent the has_foundation_stock relation. Some relations and VBO terms are not displayed in this figure for clarity. All VBO IDs are reported in Table 2.

Table 1: Minimum Information for the Reporting of an Ontology (MIRO) for VBO.
This table was created based on the MIRO guidelines as described in Matentzoglu et al. 14

F.3 Versioning policy
We aim to release a new ontology version every month.

G. Quality Assurance (QA)
G.1 Testing
We use the general QA/QC included in the ODK framework. In addition, we are working on creating new QC tests specific to VBO.

Table 2: List of identifiers reported in this publication.
Note that the category "VBO term-breed" includes sub-breed, variety, etc.